-
Notifications
You must be signed in to change notification settings - Fork 200
feat: adding the operation delete_all_documents to the OpenSearchDocumentStore
#2321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adding the operation delete_all_documents to the OpenSearchDocumentStore
#2321
Conversation
delete_all_documents to the OpenSearchDocumentStore
|
Just a quick thought... I am not sure deleting and recreating the index is a good idea. As a user, I'd expect A bulk delete might be a safer option. Maybe it's also worth checking how this was done in Haystack 1.x. |
|
I was investigating into it and for both Elastic and Open, specially for large volumes of data, this is the most efficient way to do it. In both cases the indexes are recreated with the same mappings/settings. The only issue is if a user changes the index after creation/population. |
|
What about something like: def delete_all_documents(self, recreate_index=False)By default we apply a usual delete operation, and if |
I would prefer this idea! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks quite good to me already. My main suggestion is that if we have a try except in delete_all_documents_async, it makes sense to add the same to the sync implementation of delete_all_documents to ensures consistent behavior.
Other than that, I have a suggestion for the tests to make sure they only test one thing. You could create a separate test, for example test_index_functionality_after_delete_all_documents for testing writing and retrieving of documents after deleting all documents. Something like:
def test_index_functionality_after_delete_all_documents(self, document_store):
"""Test that documents can be written and retrieved after delete_all_documents"""
document_store.write_documents([Document(id="1", content="Test")]) # or add more documents
document_store.delete_all_documents()
new_doc = Document(id="2", content="New test")
document_store.write_documents([new_doc]) # or add more documents
assert document_store.count_documents() == 1
results = document_store.filter_documents()
assert len(results) == 1
assert results[0].content == "New test"
integrations/opensearch/src/haystack_integrations/document_stores/opensearch/document_store.py
Show resolved
Hide resolved
integrations/opensearch/src/haystack_integrations/document_stores/opensearch/document_store.py
Outdated
Show resolved
Hide resolved
integrations/opensearch/src/haystack_integrations/document_stores/opensearch/document_store.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! 👍 _prepare_delete_all_request in particular is a good alternative to recreating the index and the default of delete_all_documents(recreate_index=False) makes sense to me.
Related Issues
delete_all_documents()operation toOpenSearchDocumentStore#2315Proposed Changes:
delete_all_documents()method to theOpenSearchDocumentStoreclass to both Synchronous and Asynchronous versionsrecreate_index=False(default): Usesdelete_by_queryAPI for faster deletion of large datasetsrecreate_index=True: Recreates the entire index, preserving original mappings and settings_deserialize_search_hits()→@staticmethod_process_bulk_write_errors()→@staticmethod_deserialize_document()→@staticmethod_postprocess_bm25_search_results()->@staticmethod📝 Usage Example
How did you test it?
test_document_store.py- Addedtest_delete_all_documents()with and without index recreationtest_document_store_async.py- Addedtest_delete_all_documents_async()with and without index recreationChecklist
fix:,feat:,build:,chore:,ci:,docs:,style:,refactor:,perf:,test:.